Modeling Dyadic Data with Binary Latent Factors

نویسندگان

Edward Meeds

Zoubin Ghahramani

Radford M. Neal

Sam T. Roweis

چکیده

We introduce binary matrix factorization, a novel model for unsupervised matrix decomposition. The decomposition is learned by fitting a non-parametric Bayesian probabilistic model with binary latent variables to a matrix of dyadic data. Unlike bi-clustering models, which assign each row or column to a single cluster based on a categorical hidden feature, our binary feature model reflects the prior belief that items and attributes can be associated with more than one latent cluster at a time. We provide simple learning and inference rules for this new model and show how to extend it to an infinite model in which the number of features is not a priori fixed but is allowed to grow with the size of the data. 1 Distributed representations for dyadic data One of the major goals of probabilistic unsupervised learning is to discover underlying or hidden structure in a dataset by using latent variables to describe a complex data generation process. In this paper we focus on dyadic data: our domains have two finite sets of objects/entities and observations are made on dyads (pairs with one element from each set). Examples include sparse matrices of movie-viewer ratings, word-document counts or product-customer purchases. A simple way to capture structure in this kind of data is to do “bi-clustering” (possibly using mixture models) by grouping the rows and (independently or simultaneously) the columns[6, 13, 9]. The modelling assumption in such a case is that movies come in K types and viewers in L types and that knowing the type of movie and type of viewer is sufficient to predict the response. Clustering or mixture models are quite restrictive – their major disadvantage is that they do not admit a componential or distributed representation because items cannot simultaneously belong to several classes. (A movie, for example, might be explained as coming from a cluster of “dramas” or “comedies”; a viewer as a “single male” or as a “young mother”.) We might instead prefer a model (e.g. [10, 5]) in which objects can be assigned to multiple latent clusters: a movie might be a drama and have won an Oscar and have subtitles; a viewer might be single and female and a university graduate. Inference in such models falls under the broad area of factorial learning (e.g. [7, 1, 3, 12]), in which multiple interacting latent causes explain each observed datum. In this paper, we assume that both data items (rows) and attributes (columns) have this kind of componential structure: each item (row) has associated with it an unobserved vector of K binary features; similarly each attribute (column) has a hidden vector of L binary features. Knowing the features of the item and the features of the attribute are sufficient to generate (before noise) the response at that location in the matrix. In effect, we are factorizing a real-valued data (response) matrix X into (a distribution defined by) the product UWV>, where U and V are binary feature matrices, andW is a real-valued weight matrix. Below, we develop this binary matrix factorization

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data

The goal of this study is to introduce a statistical method regarding the analysis of specific latent data for regression analysis of the discrete data and to build a relation between a probit regression model (related to the discrete response) and normal linear regression model (related to the latent data of continuous response). This method provides precise inferences on binary and multinomia...

متن کامل

Topic Modeling for Personalized Recommendation of Volatile Items

One of the major strengths of probabilistic topic modeling is the ability to reveal hidden relations via the analysis of co-occurrence patterns on dyadic observations, such as document-term pairs. However, in many practical settings, the extreme sparsity and volatility of cooccurrence patterns within the data, when the majority of terms appear in a single document, limits the applicability of t...

متن کامل

Local Optimality of User Choices and Collaborative Competitive Filtering

While a user’s preference is directly reflected in the interactive choice process between her and the recommender, this wealth of information was not fully exploited for learning recommender models. In particular, existing collaborative filtering (CF) approaches take into account only the binary events of user actions but totally disregard the contexts in which users’ decisions are made. In thi...

متن کامل

Unified Modeling of User Activities on Social Networking Sites

Social networking sites like Facebook and Twitter are teeming with users and the content posted by them. Several activities like friendship/followership, authoring, commenting on, liking, resharing/retweeting posts typically occur on these sites. In this paper, we make an attempt at the unified modeling of various such activities on social networking sites. We propose a novel joint latent facto...

متن کامل

A guide for multilevel modeling of dyadic data with binary outcomes using SAS PROC NLMIXED

In the social and health sciences, data are often structured hierarchically, with individuals nested within groups. Dyads constitute a special case of hierarchically structured data with variation at both the individual and dyadic level. Analyses of data from dyads pose several challenges due to the interdependence between members within dyads and issues related to small group sizes. Multilevel...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2006

Modeling Dyadic Data with Binary Latent Factors

نویسندگان

چکیده

منابع مشابه

The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data

Topic Modeling for Personalized Recommendation of Volatile Items

Local Optimality of User Choices and Collaborative Competitive Filtering

Unified Modeling of User Activities on Social Networking Sites

A guide for multilevel modeling of dyadic data with binary outcomes using SAS PROC NLMIXED

عنوان ژورنال:

اشتراک گذاری